Astronomical researchers often think of analysis and visualization as separate tasks. In the case of high-dimensional data
sets, though, interactive exploratory data visualization can give far more insight than an approach where data processing 
and statistical analysis are followed, rather than accompanied, by visualization. This paper attempts to chart a course
toward "linked view" systems, where multiple views of high-dimensional data sets update live as a researcher selects, 
highlights, or otherwise manipulates, one of several open views. For example, imagine a researcher looking at a 3D volume 
visualization of simulated or observed data, and simultaneously viewing statistical displays of the data set's properties 
(such as an x-y plot of temperature vs. velocity, or a histogram of vorticities). Then, imagine that when the researcher
selects an interesting group of points in any one of these displays, the same points become a highlighted subset in
all other open displays. Selections can be graphical or algorithmic, and they can be combined and saved. For tabular
(ASCII) data, this kind of analysis has long been possible, even though it has been under-used in Astronomy. The bigger 
issue for Astronomy and several other "high-dimensional" fields is the need for systems that allow full integration of images
and data cubes within a linked-view environment. The paper concludes its history and analysis of the present situation 
with suggestions that look toward cooperatively-developed open-source modular software as a way to create an evolving, 
flexible, high-dimensional, linked-view visualization environment useful in astrophysical research.

1 Introduction

Historically, Astronomy has been a visual science. Thousands of years ago observations were
carried out with the naked eye; hundreds of years ago telescopes augmented the eye; and during
the last century sensitive film and CCD recording devices enhanced what the eye could see. More
recently, observing techniques spanning the full electromagnetic spectrum have been developed,
as have techniques for statistical comparison with analytic and numerical theoretical
predictions. Oddly though, as Astronomy's wavelength coverage increased, the value of the
"visual" to astronomers seems to have declined, not as a wavelength, but as a tool. Too often,
wavelength-specific studies of tiny patches of sky, or statistical analyses of tremendous
catalogs of information, are carried out with very little attention paid to context. Viewing
what surrounds a tiny narrow-field image, or studying a catalog's content in context on a
wide-field sky, often gives unexpected and valuable information. Understanding the context of
catalog data in high-dimensional spaces, where information can be compared across wavelengths
and across models, can be similarly illuminating. Evolution has made humans amazingly good at
pattern recognition, and this paper is about how analysis techniques that marry humans'
extraordinary visualization capabilities to statistical principles are, and should continue to
be, on the rise within modern astronomy.¹



2 Data-Dimensions-Display 

There are three simple words to keep in mind when one sets out to explore and/or explain
high-dimensional information with visualization: data, dimensions, and display. Any data set
containing the equivalent of more than two columns worth of information can be thought of as
"high-dimensional." In some cases, the dimensions may be spatial or temporal, but in other
cases the dimensions might be just columns in a data table, so a "high-dimensional" space can
be highly abstract.²

Consider Figure 1, which shows a simple Cartesian graph documenting attendance at
Astronomische Gesellschaft (AG) meetings over time. The data used to create this graph are
from the AG website,³ which contains a table with 8 columns, listing: Year, RGA,⁴ City, Date,
Number of Members, Number of Attendees, Number of Lectures, and Number of Posters. Thus, this
data set has at least 8 dimensions (and more if locations' GPS coordinates were to be used in
lieu of placenames).

Fig. 1 History of the AG meeting.

1 Hassan & Fluke (2011) recently published a uniquely comprehensive review of the recent
history of visualization in Astronomy, and the interested reader is referred to that work for
details and links to software not provided here.

2 Wong & Bergeron (1997) provide an excellent review of multi-dimensional multi-variate
visualization that includes a good discussion of the meaning of the word "dimensions" within
various disciplines.

3 www.astronomische-gesellschaft.org/en/tagungen

4 An index number on "regular" meetings of the General Assembly.

In order to best convey meaning to a particular audience, one needs to consider the display
mode that can and will be used. For example, when I presented Figure 1 at the 2011 AG meeting,
to a group of astronomers, I used slideware on a data projector. Even though I could have
created and shown a kooky unconventional display (e.g. a time-lapse movie showing a world map
where spinning graphs representing ratios of attendees/talks/posters at AG meetings float
above relevant cities), I knew that my audience would not expect or understand such a display.
So, I chose instead a standard x-y time-series-style graph, where dimensions (number of
members, number of attendees) are plotted as number vs. time and where a calculated
diagnostic, percentage of members attending, is shown using an additional (right-hand) y axis,
but a shared x-axis (time). Partial information from one additional dimension is also shown,
since yellow-highlighted points indicate locations outside of Germany. Thus, 4+ dimensions
(three tabulated, one calculated, one partial) are shown in a 2-dimensional display. Context
from beyond the online table is added to this display in the form of labeled grey bands
showing the duration of the two world wars, which explain gaps in the series of meetings.
Subtle stylistic choices⁵ about display are also made, so that, for example, attendance
numbers are shown as a series of vertical lines connecting dots to the zero-line, looking a
bit like a histogram made of "headed" symbols. The graph is labeled within its borders, so as
to avoid the need for an extensive caption.

The AG meetings example in Figure 1 offers a very specific, time-series-based, example of
Data-Dimensions-Display principles, but deeper value is to be had when D-D-D is considered as
a more general construct. Figure 2 shows a cube representing an abstract three-dimensional
space. In Astronomy, and most other sciences, data are often acquired as a function of many
dimensions (e.g. intensity as a function of space, time, wavelength, etc.).⁶ Yet, subsets of
those data are usually only displayed and analyzed along one or two dimensions at a time (e.g.
as a spectrum showing intensity as a function of wavelength).

Consider the color-coded examples listed in the grey 
box associated with Figure 2. In Astronomy, intensity as a
function of one (non-spatial) dimension is most frequently 
thought of and displayed in an x-y graph as a spectrum, an 
SED, or a time-series. Intensity as a function of two (spatial) 
dimensions often is appropriately thought of and displayed 
as an image or contour map. In many cases, such as in maps 
of spectral-line emission or layers of data at multiple wave- 
lengths, images or contour maps can be contextualized as 
"slices" through a higher-dimensional (3D) space that forms 
what is typically called a "data cube." 
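
As a minimal sketch of the relationships just described, the common 1D and 2D displays can be written as slices or reductions of one 3D array. The sketch assumes a cube held as a NumPy array with axes ordered (velocity, y, x); the array contents and sizes are invented for illustration and are not taken from any data set discussed here.

```python
import numpy as np

# Hypothetical position-position-velocity cube: axes are (velocity, y, x).
rng = np.random.default_rng(0)
cube = rng.random((16, 32, 32))

# 2D display: one velocity channel is an image "slice" through the cube.
channel_map = cube[5, :, :]

# 1D display: intensity vs. velocity at one sky position is a spectrum.
spectrum = cube[:, 10, 20]

# 2D display: summing along the velocity axis gives an integrated map.
integrated_map = cube.sum(axis=0)

print(channel_map.shape, spectrum.shape, integrated_map.shape)
```

The point of the sketch is that the spectrum and the channel map share values (here `spectrum[5]` and `channel_map[10, 20]` are the same number), which is exactly the kind of common dimension a linked-view tool can exploit.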

The set of display modes for seeing all the data in a cube is growing and presently features
static 3D renderings, stereoscopic display, and interactive representations (Hassan & Fluke
2011). In cases where it is possible to generate a series of data cubes as a function of some
fourth dimension (usually time), 3D animations and/or sets of small multiples (repeated
versions of 3D views seen side-by-side) are often used for display.⁷

Fig. 2 Data "Cubes" in Astronomy.

5 The works of Edward Tufte (e.g. Tufte 2001) are an excellent general resource concerning how
to optimize visual displays of quantitative information.

6 In commercial data analytics and statistical analysis systems, high-dimensional
"hyper-cubes" are commonly analyzed and visualized using "OLAP" (online analytical processing)
technologies.

Software for analyzing observations and simulations is 
constantly growing more capable due to increased compu- 
tational performance. But, even the most modern astronom- 
ical software packages still do not make enough use of ex- 
plicit connections between the dimensions inherent in a data 
set. Instead, various kinds of displays (x-y graphs, images, 
volume renderings) are created separately, using tools that 
do not link common dimensions across all active plots. Be- 
low, I explain a "linked view" approach that is likely to be- 
come the essential path to insight as astronomical data sets 
continue to expand in complexity and size. 

3 Linked Views 

Live linking of views across display modes holds the key to effective visualization and
analysis of high-dimensional data sets (cf. Gresh et al. 2000; Tukey 1977; Wong & Bergeron
1997). Figure 3 shows a cartoon where four types of graphical display of a high-dimensional
data set are pictured with one data subset highlighted in red. In an effective "linked view"
visualization system, the kind of highlighting the red coloring represents is done
interactively, in real time, and the selections made can be saved and combined with other
selections for use in analysis.⁸
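
One way to picture the live linking described above is a small observer pattern in which every open view shares a single selection and is notified whenever it changes. The sketch below is only an illustration under that assumption: the names `SharedSelection` and `View` are hypothetical and do not correspond to any package mentioned in this paper, and the "views" merely record which rows they would highlight rather than drawing anything.

```python
import numpy as np


class SharedSelection:
    """A boolean mask over table rows, shared by every open view."""

    def __init__(self, n_rows):
        self.mask = np.zeros(n_rows, dtype=bool)
        self._listeners = []

    def subscribe(self, callback):
        # Register a function to call whenever the selection changes.
        self._listeners.append(callback)

    def set(self, mask):
        # Store the new selection and notify every subscribed view.
        self.mask = np.asarray(mask, dtype=bool)
        for callback in self._listeners:
            callback(self.mask)


class View:
    """Stand-in for a plot; records which rows it would highlight."""

    def __init__(self, name, selection):
        self.name = name
        self.highlighted = np.array([], dtype=int)
        selection.subscribe(self.refresh)

    def refresh(self, mask):
        self.highlighted = np.flatnonzero(mask)


selection = SharedSelection(n_rows=6)
scatter = View("x-y plot", selection)
histogram = View("histogram", selection)

# A selection made in any one view is pushed to all views at once.
selection.set([False, True, True, False, False, True])
print(scatter.highlighted, histogram.highlighted)
```

Keeping the selection separate from the views is the design choice that matters here: a new kind of display can join the linked set simply by subscribing, without the existing views knowing about it.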

7 There are many excellent software packages capable of achieving beautiful visualizations of
high-dimensional real and simulated data, nearly all of which are explained and listed in the
recent review by Hassan & Fluke (2011). Here, I have chosen to focus instead on ideas about
how to link the information in visualizations amongst otherwise-hidden dimensions and aspects
of a data set.

When researchers can easily investigate the behavior of trends and outliers in all the
dimensions of the data at hand, they learn more from those data. (Conversely, if it is not
easy to carry out exploratory investigations, researchers will often stop analysis at a stage
where key insights remain hidden within a high-dimensional data set.) For an astronomical
example, imagine that a particular group of points in an x-y plot of flux vs. velocity
appeared to have aberrant behavior. In a linked-view system, a user could immediately
highlight, select, and optionally include/exclude those points from display and analysis
amongst other dimensions, for example in a plot of velocity vs. signal-to-noise ratio, which
might show the aberrant points to have low significance. Extrapolating from this simple
example, one can imagine and appreciate the power real-time linked views offer for making more
sophisticated investigations, such as data selection based on behavior seen in a combination
of several dimensions. In Astronomy, the ability to interactively explore the connections
between data points in statistical graphs and the same measurements' positions in "real" 3D
space, and vice-versa, is particularly powerful.
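
The flux vs. velocity example can be worked through with a few lines of NumPy. This is a sketch only: the catalog values and the thresholds (`flux > 5.0`, `snr >= 3.0`) are invented for illustration, and a threshold cut stands in for what would be a graphical selection in a real linked-view tool.

```python
import numpy as np

# Invented catalog: one row per measurement, one column per dimension.
flux = np.array([1.0, 1.2, 9.5, 1.1, 8.7, 0.9])
velocity = np.array([2.0, 2.1, 2.2, 1.9, 2.3, 2.0])
snr = np.array([12.0, 15.0, 1.5, 11.0, 2.0, 14.0])

# Selection made in the flux vs. velocity view: the points with
# unusually high flux (a stand-in for a graphical "brush").
aberrant = flux > 5.0

# The same row mask applies in the velocity vs. S/N view, where it
# shows the significance of the highlighted points.
print(velocity[aberrant], snr[aberrant])

# Selections can be combined and reused as filters for analysis.
significant = snr >= 3.0
keep = significant & ~aberrant
print(flux[keep].mean())
```

Because a selection is just a mask over shared rows, it carries over to any other dimension of the same table, and combining selections reduces to ordinary boolean operations.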




Fig. 3 Linked Views (figure created by M. Borkin) 



In the realm of point-based data (e.g. ASCII tables), the benefits of interactive linked views
were first explored by John Tukey and his colleagues using the PRIM-9 system they developed in
the 1970s.⁹ No readily-available computers 40 years ago had input devices that could be used
to graphically select subsets of data, so Tukey's team had to design a custom visualization
control box with many buttons, all of which had special selection- and manipulation-oriented
functions.¹⁰ Tukey's ideas on Exploratory Data Analysis (Tukey 1977), including principles he
called "picturing," "rotation," "isolation," "brushing," and "masking," were first implemented
commercially in 1986 in the Macintosh-only program DataDesk, which is still in use today on
Macs and PCs.¹¹

LIST 1 gives a summary of the mainstream commercial descendants and offshoots of the
Exploratory Data Analysis principles espoused by Tukey. These are powerful tools for exploring
tabular data on its own, but none of them links image-based or image-cube-based information to
catalog (tabular) data, which is the key missing link in astronomical data analysis today.

8 See www.kitware.com/InfovisWiki/index.php/Linked_Views and references cited there for more
information.

9 See Friedman & Stuetzle (2002) for a review.

10 An excellent demonstration video showing PRIM-9 is at stat-graphics.org/movies/prim9.html.

11 In 1986 the Macintosh operating system, then two years old, was the only widely-available
computer with a mouse-driven graphical user interface needed to make the PRIM-9 ideas
practicable.

LIST 1: Commercial Linked View Software for Analyzing Tabular Data¹²

DataDesk, est. 1986
www.datadesk.com, inspired by John Tukey's and Paul Velleman's work on "Exploratory Data
Analysis", see Friedman & Stuetzle (2002) for a review

Spotfire, est. 1996
spotfire.tibco.com, inspired by Chris Ahlberg's and Ben Shneiderman's ideas about interactive
data display, see www.cs.umd.edu/hcil/spotfire/ and references therein, including Ahlberg &
Shneiderman (1994)

Tableau, est. 2003
www.tableausoftware.com, inspired by Chris Stolte, Diane Tang, and Pat Hanrahan's work on
"Polaris" and VizQL (Visual Query Language), see Stolte et al. (2002)

Microsoft Business Intelligence ("BI"), est. 2000's
www.microsoft.com/en-us/bi/default.aspx, inspired by extensions to Microsoft's SQL database
services and Excel spreadsheet (in the form of "PowerPivot" add-on)





Fig. 4 Tableau Samples 



Figure 4 shows a screenshot of just a few of the kinds of graphs that Tableau and its ilk
produce. Color is used to subset and link points shown in multiple displays, and subsets of
particular colors can be defined graphically or algorithmically, and they can be saved.
Pre-made maps like the one shown in the top panel can be used as backgrounds and pre-defined
bounded regions (e.g. US states) can be used as selection facets, but new boundaries within an
image (known as new "segmentations") cannot be easily added.

Similar linked-view software packages for exploring 
tabular data are available in the Open Source community 
(LIST 2). These often have less intuitive or polished graph- 
ical interfaces, but they may become exceptionally useful 
as flexible, statistically-sophisticated, modules that can be 
integrated into a set of inter-operable tools as discussed in 
Section 4. 

LIST 2: Sample Open-Source and/or Free Linked View Software for Analyzing Tabular Data

ggobi: www.ggobi.org, cf. the "rggobi" package in R/CRAN, cran.r-project.org/web/packages/rggobi

Weave:¹³ www.oicweave.org

Mondrian: stats.math.uni-augsburg.de/Mondrian

Viewpoints: astrophysics.arc.nasa.gov/~pgazis/viewpoints.htm

XmdvTool: http://davis.wpi.edu/xmdv

TOPCAT: www.star.bris.ac.uk/~mbt/topcat

ViVA Workbench: http://iplant-viva.sourceforge.net/

TITAN: www.kitware.com/InfovisWiki/index.php/Main_Page

In geography and demographics, so-called "GIS" or "Geographic Information System" tools such
as ESRI's ArcGIS¹⁴ and Pitney Bowes' MapInfo Professional¹⁵ and Engage 3D Pro¹⁶ offer powerful
linked-view systems where maps are used as layers. Importantly, though, the maps themselves
are not typically treated as data pixel-by-pixel, so selection within a map is usually along
pre-defined region boundaries, making the selection and extraction of map-based data for an
arbitrary user-selected region less than fully straightforward.
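
To make concrete what treating a map "as data pixel-by-pixel" would allow, the sketch below represents an arbitrary user-selected region as a boolean mask over the pixel grid and uses it both to extract pixel values and to select catalog rows. Everything in it is an assumption made for illustration: the image and catalog are invented, the catalog positions are taken to be already converted to pixel indices, and a circle stands in for a free-form region.

```python
import numpy as np

# Invented 2D image and a catalog whose positions are already in pixels.
image = np.arange(100, dtype=float).reshape(10, 10)
cat_x = np.array([1, 4, 5, 8])   # column index of each source
cat_y = np.array([1, 5, 4, 8])   # row index of each source

# An arbitrary user-drawn region is just a boolean mask over pixels.
# A circle of radius 2 centred on (x=5, y=5) stands in for it here.
yy, xx = np.indices(image.shape)
region = (xx - 5) ** 2 + (yy - 5) ** 2 <= 2 ** 2

# Image side of the link: extract the pixel values inside the region.
region_pixels = image[region]

# Catalog side of the link: sources whose pixel falls inside the region.
in_region = region[cat_y, cat_x]
print(region_pixels.size, np.flatnonzero(in_region))
```

With the region held as a mask, the same selection applies equally to image pixels and to table rows, which is the image-to-catalog link that the tabular tools above lack.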

So, are there any working robust tools that offer image- and cube-savvy linked-view
visualization and analysis environments? Sort of. In the early 2000's there were two notable
attempts to implement an image and/or cube-enabled linked view tool: WEAVE at IBM (Gresh et
al. 2000) and MIRAGE at Bell Labs/Lucent (Ho 2003).

WEAVE was developed in Bernice Rogowitz' group at IBM Research, to support a collaboration
between computer and cognitive scientists with medical researchers. It comes the closest to a
system that would be perfect for the analysis of high-dimensional astronomical data (see
Figure 5). Unfortunately, though, the IBM WEAVE project's software, built linking Data
Explorer and Diamond (a precursor to Opal/ViVA, see LIST 2) via ActiveX, is no longer
supported or available.

12 An interesting comparison of the last three services, and the similar "QlikView" software
(qlikview.com) is at www.practicaldb.com/blog/data-visualization-comparison

13 Note that this "Weave" is not the same as the WEAVE program developed at IBM and described
in Gresh et al. (2000).

14 www.esri.com/software/arcgis/

15 www.pbinsight.com/products/location-intelligence/

16 www.encom.com.au/template2.asp?pageid=149

Fig. 5 Screen Shot of "WEAVE" in action. Colored selections can be made in any of the 2D
analysis panels, or directly in the high-dimensional (3D) display, and all views are
live-linked. From Gresh et al. (2000).



MIRAGE was developed by visualization and statistics researchers collaborating with
astronomers, directly for use in Astronomy, and it has been integrated to some extent with
Virtual Observatory standards (Carliles et al. 2004; Ho, T.K. 2007). As of this writing,
MIRAGE can be acquired at skyservice.pha.jhu.edu/develop/vo/mirage/ but its VO functionality
is presently somewhat fragile. Furthermore, MIRAGE does not allow region selection
(segmentation) within images like WEAVE did, and it does not presently handle data cubes.

In spite of their limitations, WEAVE and MIRAGE 
demonstrated the potential of exploratory data analysis tools 
that understand 2D and 3D images. Yet, since WEAVE is no 
longer available, and MIRAGE's image-based-information 
linking is limited, neither offers a full linked-view solu- 
tion to astronomical researchers today. More recent efforts 
built on top of visualization toolkits like VTK (discussed 
below), are presently extending the image-enabled linked- 
view paradigm that WEAVE and MIRAGE pioneered. 

At present, choices open to astronomers seeking to 
implement high-dimensional linked-view visualization and 
analysis into their research can be categorized into the four 
kinds of approaches itemized in LIST 3. Examples given in 
the list are discussed in turn, below. 

LIST 3: Approaches to High-Dimensional Image- and Cube-Aware Linked View Visualization in
Astronomy

1. Use existing high-level visualization and analysis packages that satisfy
astronomy-specific requirements, such as IDL, to implement custom linked-view tools for
specific problems. Example: Dendroviz.

2. Use resource-hub and/or message-passing architectures to inter-connect software packages in
a way that they can link their views to a limited extent. Example: SAMP.

3. Adapt capabilities from software systems from beyond Astronomy. Example: Astronomical
Medicine.

4. Build a new extensible system, preferably based on open-source, re-usable, modules.
Examples: Glue, Paraview, Titan.



3.1 Custom Solutions within Existing Software, e.g. 
Dendroviz 

The screenshot in Figure 6 shows a linked-view display of a spectral-line data cube. The
"Dendroviz" (a.k.a. "Cloudviz") software used to create the views was written inside of
IDL.¹⁷ It is freely available¹⁸ to IDL users, and was written by Ph.D. student Christopher
Beaumont for his thesis work at Harvard. Many of the desirable aspects of linked views
discussed above, and schematized in Figure 3, are incorporated here. The tree-like diagram at
upper left in the figure shows a hierarchical decomposition of the spectral-line intensity
within a 3D (position-position-velocity) cube. The x-y plot at lower right shows another
physical diagnostic of the gas, and the two other panels show volume visualizations and slice
views of the data. Linking is possible by selecting in any 2D analysis plot (e.g., tree, x-y)
and then seeing selections as colored regions within the 2D (slice) and 3D (volume) data
displays. Selections can be saved, combined, and output as filters.¹⁹

Thus, it is possible within a general-purpose program 
like IDL to design a custom linked-view environment. But, 
this approach has some serious limitations. First, it is difficult
or impossible to make arbitrarily-shaped selections
within the image-based environment. And second, the func- 
tional and aesthetic qualities of the user interface and visu- 
alization layouts here are not very good, and they cannot be 
improved when one is restricted to using only IDL. 

3.2 Hubs and Message Passing Amongst Disparate 
Programs, e.g. SAMP 

A much more general approach to linking views of astronomical data is offered by SAMP, a
message-passing architecture developed by Mark Taylor and colleagues within the International
Virtual Observatory Community.²⁰

17 www.exelisvis.com/ProductsServices/IDL.aspx

18 at code.google.com/p/cloud-viz/

19 Videos demonstrating Dendroviz functionality and usage are online at
projects.iq.harvard.edu/seamlessastronomy/software/dendrograms.

20 The SAMP standard is described at www.ivoa.net/Documents/SAMP

Fig. 6 Example: Screen Shot of "DendroViz" project, courtesy C. Beaumont.



Figure 7 shows a screen shot of SAMP in action. At the upper left, an Aladin²¹ window is open
showing the cluster NGC7023, with several catalog sources overlaid. The same region of the sky
and catalog data are shown in Worldwide Telescope²² (upper right), and the catalog data are
shown in TOPCAT,²³ which can manipulate those data in a statistical-graphics environment not
unlike DataDesk (lower left). Other popular astronomical analysis environments, like ds9,²⁴
can also connect to SAMP, but are not shown in this example.

So, what does SAMP do? When applications "connect" 
to the SAMP hub, as they were during the session captured
in Figure 7, they pass simple messages amongst themselves,
telling each other what coordinates and field of view are 
currently being used, and what catalog sources are selected 
and sub-setted. Thus, a savvy user can run SAMP to effec- 
tively link views, bringing the functionality of several pro- 
grams to bear on the same data set at once, in a concerted 
way. Other than the screen real-estate challenge posed by 
the need to keep track of the (many!) windows open while 
SAMP connects disparate applications, the major limitation 
of the SAMP system at present is the lack of tools to select 
arbitrary regions within an image, and link such selections. 
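
The hub-and-message idea described above can be sketched in a few lines. Note the assumptions: `ToyHub` is a hypothetical in-process stand-in, not the real SAMP protocol (in which separate applications talk to a hub process); the client names and parameter values are invented; and the message-type strings simply reuse the names of two standard SAMP MTypes, `coord.pointAt.sky` and `table.select.rowList`, to show what kind of information is exchanged.

```python
from collections import defaultdict


class ToyHub:
    """In-process stand-in for a message hub (not the real SAMP protocol)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, client, mtype, callback):
        # A client declares interest in one message type.
        self._subscribers[mtype].append((client, callback))

    def broadcast(self, sender, mtype, params):
        # Deliver to every subscriber of this type except the sender.
        for client, callback in self._subscribers[mtype]:
            if client != sender:
                callback(sender, params)


received = []


def make_handler(client):
    # Each toy application just logs the messages it receives.
    return lambda sender, params: received.append((client, sender, params))


hub = ToyHub()
hub.subscribe("image-viewer", "coord.pointAt.sky", make_handler("image-viewer"))
hub.subscribe("image-viewer", "table.select.rowList", make_handler("image-viewer"))
hub.subscribe("table-tool", "table.select.rowList", make_handler("table-tool"))

# A sky browser tells the other applications where it is pointing.
hub.broadcast("sky-browser", "coord.pointAt.sky", {"ra": "10.0", "dec": "20.0"})

# The table tool announces a row selection; it does not hear its own message.
hub.broadcast("table-tool", "table.select.rowList",
              {"table-id": "catalog-1", "row-list": ["0", "3"]})

for entry in received:
    print(entry)
```

Because only coordinates and row lists are passed, applications stay independent, which also illustrates the limitation noted above: no message in this sketch describes an arbitrary region within an image.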

The good news is that SAMP has recently been web-enabled,
so that Java and web-based applications can now
be connected within a fully online environment.



21 aladin.u-strasbg.fr

22 worldwidetelescope.org



3.3 Adaptation from Beyond Astronomy, e.g. 
Astronomical Medicine 

Astronomy is not the only field faced with the challenge of 
incorporating high-dimensional information into quantita- 
tive analyses: geography, medicine, biology, and other fields 
share similar challenges. The overlap of methods used in 
these fields, especially in astronomical and medical imaging 
and analysis, is far greater than one might imagine at first. 
Over the past five years, a group of us at Harvard²⁵ have
been exploring the efficacy of directly adapting tools developed
for medical imaging into the astronomical research
environment (e.g. Borkin et al. 2007).

It is clear that the high-dimensional visualization and manipulation tools available in the
medical community, largely based on the VTK and ITK toolkits (discussed further in §3.4), are
far superior to those typically available to astronomers. Figure 8 shows an example of the use
of 3DSlicer, a program developed in part at the Surgical Planning Laboratory of Brigham and
Women's Hospital in Boston, used to view data about a star-forming region. Notice that Figure
8 shows multiple 3D spectral-line data sets at once, and a moveable (black and white) 2D plane
showing a 2D dust image is incorporated as well. Our group at Harvard's Initiative in
Innovative Computing managed to write a converter (fits2itk) aware of astronomical coordinate
systems to move FITS images into the ITK format,²⁶ but preserving astronomical metadata beyond
coordinates was not trivial in the medically-optimized 3DSlicer environment.

23 www.star.bris.ac.uk/~mbt/topcat

24 hea-www.harvard.edu/RD/ds9

25 am.iic.harvard.edu

26 available at am.iic.harvard.edu/FITS-reader

Fig. 7 Example: Screen Shot of SAMP-connected Applications

Most importantly, while medical tools can offer great vi- 
sualizations, they do not typically implement linked views 
of tabular data that can interact with volume- and slice- 
based visualizations. Thus, for now, it is necessary to sep- 
arate high-performance visualization work from statistical 
analyses using medically-optimized systems, but we expect 
that situation to change in the near future, and we look forward
to trying out more software developed for other fields
within the astronomical context.




Fig. 8 "Astronomical Medicine" view of L1448 (axes labeled DEC and Velocity), created using
3DSlicer by Jens Kauffmann. Similar figures were published as the first interactive 3D PDFs in
the journal Nature (Goodman et al. 2009).



3.4 New Solutions in Open Source Environments, e.g. 
Glue, Paraview, Titan 



More than a decade ago North & Shneiderman (2000) investigated the idea that non-programming
users could "snap together" visualization modules on-the-fly to create whatever custom
linked-view environment would best address a particular problem. The Dendroviz solution
discussed above is an implementation of this approach within IDL, but it requires a
programming-savvy user.

With the ascension of Python as the preferred modern programming language within Astronomy and
other fields of science, there has been an explosion in the amount of open-source code
available to researchers for re-use. (See www.astropython.org and www.scipy.org.) Several
graphics and table-manipulation packages are already available, and many of them even
understand astronomical coordinate systems and units(!). Similarly, the statistical analyses
available within the R language (and the accompanying CRAN packages²⁷) can be interconnected
to create nearly any needed analysis. If these packages can be "glued" together effectively,
then it should be possible, even for non-programming users, to create a linked-view
visualization and analysis environment using primarily free Python-based and R-based²⁸ tools
in the very near-term future.

A group of us (Christopher Beaumont, Michelle Borkin, Thomas Robitaille, Hanspeter Pfister and
me) are actively working on a new Python-based linked-view visualization system code-named
"Glue." We have already created a hub that allows various Python modules to be "linked"
without their code being merged. Currently, we are working on the user interface, and
ultimately our plan is to connect Glue to R and to SAMP-enabled applications, which would
offer a true "snap together" linked-view visualization environment for Astronomers, and for
other researchers. Glue will be fully open-source and available, and we more than welcome
collaboration from the community in this endeavor.²⁹

27 http://cran.r-project.org

28 The Rpy tools at rpy.sourceforge.net allow R functions to be accessed from within Python.

Glue should be able to build upon and extend important packages that use the Visualization
Tool Kit (VTK)³⁰ as a scientific visualization platform. The 3DSlicer program used in the
Astronomical Medicine project is an example of VTK being used to create a sophisticated
medical visualization system. More general efforts, such as Paraview,³¹ are applicable to many
non-medical data formats, including astronomical simulation outputs (but not yet observational
formats that use astronomical coordinates). And, most promisingly, the collaborative Titan
effort³² marries VTK-based scientific visualization to information and statistical
visualization modules,³³ including those from open-source efforts such as R/CRAN.

4 Seamless Astronomy: A Vision for the 
Future 

To understand what we³⁴ mean when we say we strive for "seamless" astronomical research,
imagine this:³⁵
A smartphone application featuring interesting new astronomy images shows you the inset image
in the middle of Figure 9. You have wireless connectivity and some kind of large display
handy, and you are curious to know more. First, you flick the image off your phone to your
large display.³⁶ Next, you find out where this image belongs on the sky, using a recognition
service that either examines embedded metadata in its header³⁷ or its content.³⁸ Now you use
VO services embedded in any number of applications, for example the Worldwide Telescope³⁹
(Goodman et al. 2012), to put this image in context, allowing you to view how it looks in
comparison to extant images at many wavelengths.⁴⁰ Figure 9 shows the result of uploading this
image to the "astrometry" group on flickr, and then selecting the "View in Worldwide
Telescope" link that appears in the comments on the resulting page⁴¹ a few minutes later, and
then changing the background view to show the latest WISE infrared imagery. You're wondering
about the young-star population in the area, so you first use the VO-searching capabilities
built in to WWT to add an overlay of 2MASS sources (not shown here, due to high density of
such sources!), and then later you connect WWT to other astronomical and visualization
applications using SAMP and Glue. You're curious if there are any molecular-line maps of this
region, so you use the features built-in to WWT and ADSLabs⁴² to find and display a list of
all the papers that mention "data cubes" and study this region. One of them has a great map of
¹³CO emission in Perseus, and you want to see the 2D images in Figure 9 and the catalogs you
have retrieved online in the context of those 3D maps. You're lucky and the person publishing
the CO map included persistent hdl tags in her paper that lead you to a "Dataverse"⁴³ online
repository at theastrodata.org where you can retrieve and/or link to the data cube.⁴⁴ Now, you
call upon the capabilities of Glue to display and analyze a live-linked combination of: the 3D
spectral-line data; moveable planes that hold the imagery shown in Figure 9; the catalogs
you've linked to via SAMP; and a calculated "dendrogram" decomposition of the 3D data you
calculated using a module within Glue.

29 See projects.iq.harvard.edu/seamlessastronomy/software for more information.

30 www.kitware.com/products/books.html

31 www.paraview.org

32 www.kitware.com/InfovisWiki/index.php/Main_Page

33 There is a growing class of such efforts, including the Mayavi project (?), which combines
VTK and Python in a modular, extensible, fashion.

34 projects.iq.harvard.edu/seamlessastronomy/

37 Possible using AVM tags, see virtualastronomy.org/avm_metadata.php

38 astrometry.net can find the position of any image just based on the pattern of visible
stars it contains

39 worldwidetelescope.org

40 Presently in WWT, one can locate an image based on a FITS header, AVM header, from metadata
passed from astrometry.net (Lang et al. 2010), via flickr www.flickr.com/groups/astrometry/ or
by registering features by hand. The WWT view shown in Figure 9 can be recreated at

Using exploratory data analysis and linked views, you 
begin to notice correlations and outliers amongst the various 
dimensions of data you've displayed. It's tricky to make and 
explore some of the selections you want to make within the 
3D volumes, so you use your hands in the air, as sensed by 
a high-dimensional pointing device]^] to make those selec- 
tions. It seems that there are big shells within the CO data 
set that seem associated with young stars. How young are 
the stars? You go to the |astrobetter . com| site and discover 
a new algorithm, written in R, that offers better estimates 
of young stars' ages. So, you download that algorithm, and 
you kindly decide to make this new algorithm part of Glue 
by using Rpy, the Python interface to R^]to create a small 
Python program that uses R's statistical power to analyze 
the information about the young stars. When you're done 
with your analysis, you: 

1. publish your new Python-based young star age mod- 
ule to the Glue code repositories online (e.g. Github, 
Sourceforge) 

2. publish a paper in a Journal about your findings, includ- 
ing persistent identifiers to the data used in each graph 



Footnotes in this section offer live links to what is possible now. 

Already possible using, for example, AirPlay from Apple. 



|tinyurl . com/ seeperseus , by zooming out a bit and then selecting 
WISE from the Collection "All Sky Surveys" as the background imagery. 

41 |www. flickr .com/photos/ 6 64 9 67 9@N00/ 67 91 64 982 9/| 



adslabs . org 



5 t hedata . org 

44 at |tinyurl . com/ completel3C0per| 



At present, the Microsoft Kinect is a good, albeit low-resolution, ex- 



45 

ample of such a device. The Leap www . leapmotion . com may be the 
next, higher-resolution step. 

46 |rpy . sourceforge . net/| 
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and/or analysis shown (e.g. using the Dataverse archi- 
tecture at theastrodata.org) 
3. include interactive figures in your Journal article, and 
on your web site, allowing others to explore your data 
further (similar to the interactive 3D PDF published 
by IGoodman et al.| ( |2QQ9| ) in Nature in 2009), which 
you create following the free instructions by Josh Peek 
posted on astrobetter.com|^1 

As the footnotes demonstrate, about 90% of this sce- 
nario is possible now, even though astronomers are not 
typically aware of all of the tools that make it possible. 
It's the last 10%, which includes implementing a Glue- 
like solution and creating effective 3D interaction tech- 
niques, that stands between now and a seamless future 
of fully linked- views in high-dimensional visualization. 
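The live-linked selection at the heart of the scenario above can be illustrated with a minimal sketch. It is not Glue's design or API: the class names (LinkedData, ScatterView, HistogramView), the column names, and the toy values are all hypothetical, chosen only to show one shared selection driving several views at once.

```python
from bisect import bisect_right


class LinkedData:
    """A shared table plus one selection that every attached view observes."""

    def __init__(self, columns):
        self.columns = columns                      # dict: name -> list of values
        self.n = len(next(iter(columns.values())))  # number of rows
        self.selected = set()                       # row indices in the subset
        self._views = []

    def attach(self, view):
        """Register a view and draw it once with the current selection."""
        self._views.append(view)
        view.refresh(self)
        return view

    def row(self, i):
        return {name: values[i] for name, values in self.columns.items()}

    def select(self, predicate):
        """Algorithmic selection: keep rows where predicate(row) is true,
        then tell every view to update its highlighted subset."""
        self.selected = {i for i in range(self.n) if predicate(self.row(i))}
        for view in self._views:
            view.refresh(self)


class ScatterView:
    """Stand-in for an x-y plot: records which points are highlighted."""

    def __init__(self, x, y):
        self.x, self.y = x, y
        self.highlighted = []

    def refresh(self, data):
        xs, ys = data.columns[self.x], data.columns[self.y]
        self.highlighted = [(xs[i], ys[i]) for i in sorted(data.selected)]


class HistogramView:
    """Stand-in for a histogram: bin counts for all rows and for the subset."""

    def __init__(self, column, edges):
        self.column, self.edges = column, edges
        self.counts = []
        self.highlighted = []

    def _bin(self, indices, values):
        counts = [0] * (len(self.edges) - 1)
        for i in indices:
            b = bisect_right(self.edges, values[i]) - 1
            if 0 <= b < len(counts):                # ignore out-of-range values
                counts[b] += 1
        return counts

    def refresh(self, data):
        values = data.columns[self.column]
        self.counts = self._bin(range(data.n), values)
        self.highlighted = self._bin(data.selected, values)


# Toy, made-up values purely for illustration.
data = LinkedData({
    "temperature": [10, 25, 30, 12, 40],
    "velocity": [1.0, 2.5, 0.5, 3.0, 1.5],
    "vorticity": [0.2, 1.4, 2.7, 0.9, 1.1],
})
scatter = data.attach(ScatterView("temperature", "velocity"))
hist = data.attach(HistogramView("vorticity", [0, 1, 2, 3]))
data.select(lambda row: row["temperature"] > 20)
```

After the call to select, both views hold the same highlighted subset without either knowing about the other. A graphical selection (a lasso in one plot, say) would reduce to the same operation once it is converted into a set of row indices.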



5 Challenges 

Visualization researchers have been working to optimize
high-dimensional linked-view visualization for nearly 40 years.
Many systems exist that are great for point-based data, but
this paper demonstrates that none of these systems yet
addresses astronomical image- and cube-based data sets
adequately. The main challenges to implementing a system for
viewing, manipulating, and inter-comparing high-dimensional
astronomical data sets in a linked-view environment at present
are:

1. Big data: today's laptops can easily handle data sets like
any of the ones used as examples in this paper. But,
instruments like ALMA and integral field units, and big
numerical simulations, generate data sets far too large to
manipulate within current computing architectures. Additional
research is needed on how to most effectively retrieve and load
subsets of information into a computer's memory, so as to still
allow real-time exploration and manipulation of even the
largest data sets. Clever structuring of data sets using new
databases like SciDB[48] and the continued evolution of
MapReduce and Hadoop may help, but work will still be needed to
optimize remote "real-time" access to subsets of very large
data sets.

2. Interface design: it is difficult to avoid complicated menus
and too many open windows. The cognitive load a very flexible
system places on a user can be much greater than that of a more
rigid system, so "snap together" customizable tools can
ultimately cause confusion if implemented poorly.[49] Thus, it
will be critical to study a range of user interface options,
and match those options well to users' needs, and to their
equipment. Ten open windows may be fine if one has a giant
monitor, touch table, and/or display wall, but a system that
requires all those windows will not likely work well on
portable devices. It is also critical, and difficult, to design
a system that best supports problem solving without
overwhelming a user with options.

3. 3D selection: mice, trackpads, and touch screens have
evolved over the past 30 years to offer very good options for
selecting regions of a two-dimensional screen. But, research
into 3D selection has barely begun.[50] Human hands cannot be
moved as steadily in 3D free space as they can be on 2D
surfaces, so while Kinect-like devices offer inroads, they may
not immediately offer optimal solutions.

4. Diversity of challenges: the examples of astronomical
research challenges used in this paper are but a tiny fraction
of the range of problems researchers will bring to a system for
linked-view visualization of high-dimensional data.
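The first challenge above, loading only a subset of a large cube into memory, can be illustrated locally with a memory-mapped file. This is a minimal sketch under stated assumptions, not a solution to the remote-access problem: the file layout (a headerless little-endian float32 cube in z, y, x order) and the helper names write_cube and read_subcube are hypothetical, and real astronomical cubes (e.g. FITS files) carry headers that this sketch ignores.

```python
import mmap
import struct

ITEM_SIZE = 4  # bytes per little-endian float32 sample (an assumed layout)


def write_cube(path, shape):
    """Write a toy cube whose value at (z, y, x) encodes its own index."""
    nz, ny, nx = shape
    with open(path, "wb") as f:
        for z in range(nz):
            for y in range(ny):
                row = [float(z * 10000 + y * 100 + x) for x in range(nx)]
                f.write(struct.pack(f"<{nx}f", *row))


def read_subcube(path, shape, zs, ys, xs):
    """Return cube[zs, ys, xs] as nested lists.

    zs, ys and xs are range objects; xs must be contiguous (step 1).
    The file is memory-mapped, so only the byte ranges that are sliced
    below need to be paged in by the operating system.
    """
    _, ny, nx = shape
    n = len(xs)
    cube = []
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for z in zs:
                plane = []
                for y in ys:
                    # Byte offset of sample (z, y, xs.start) in row-major order.
                    start = ((z * ny + y) * nx + xs.start) * ITEM_SIZE
                    chunk = mm[start:start + n * ITEM_SIZE]
                    plane.append(list(struct.unpack(f"<{n}f", chunk)))
                cube.append(plane)
    return cube
```

Calling read_subcube(path, (nz, ny, nx), range(1, 3), range(2, 4), range(3, 6)) returns a 2 x 2 x 3 block while leaving the rest of the file on disk. The same offset arithmetic would apply to byte-range requests against a remote store, which is where the open research questions mentioned above begin.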

I predict that these four challenges, and others not yet 
anticipated, will be met through a combination of three 
trends that we can see emerging already. 

1. Modularity: As mashup-style software solutions become more
and more prevalent on the web today, there is every reason to
expect that expert-developed modules, each aimed at addressing
a particular need, can be "glued" together effectively if
appropriate attention is paid to standards and compatibility.

2. Open source collaborative software: The number of
astronomers, visualization researchers, and generally generous
coders who seem interested in helping to develop useful code
for research and visualization is constantly growing. This
growth, combined with the increasing ease with which coders can
share their work thanks to platforms like Github, Google Code,
and Sourceforge, should make it possible for an ever-widening
range of talent and ideas to be brought to bear on these
challenges.

3. Interdisciplinary collaboration: The Astronomical Medicine
project offers just one demonstration that the need for
linked-view visualization of images and data cubes is shared
across fields. As the number of fields faced with
high-dimensional visualization challenges expands, so will
funding for and work on this problem.
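The modularity trend can be made concrete with a small sketch of independently written modules "glued" together through one agreed interface. Everything here is hypothetical (the MODULES registry, the register decorator, the two toy modules, and the list-in, list-out convention); it stands in for whatever standard a real community would adopt and does not describe any existing package.

```python
from statistics import mean, pstdev

# Hypothetical registry: module name -> callable(list of floats) -> list of floats.
MODULES = {}


def register(name):
    """Decorator that adds a function to the registry under a unique name."""
    def wrap(func):
        if name in MODULES:
            raise ValueError(f"module {name!r} is already registered")
        MODULES[name] = func
        return func
    return wrap


@register("center")
def center(values):
    """Subtract the mean from every value."""
    m = mean(values)
    return [v - m for v in values]


@register("scale")
def scale(values):
    """Divide every value by the population standard deviation."""
    s = pstdev(values)
    if s == 0:
        raise ValueError("cannot scale values with zero spread")
    return [v / s for v in values]


def pipeline(names, values):
    """Run the named modules in order, feeding each output to the next."""
    for name in names:
        values = MODULES[name](values)
    return values


standardized = pipeline(["center", "scale"], [1.0, 2.0, 3.0])
```

Because every module honors the same input and output convention, a module contributed by someone else can be added to a pipeline by name alone, which is the kind of compatibility the first trend depends on.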

Thus, it appears that the time is ripe for Astronomers to
collaborate beyond our field's traditional boundaries in order
to create a modular open-source high-dimensional linked-view
exploratory data visualization environment.

47 available at tinyurl.com/peek3dpdf

48 www.scidb.org

49 see Rogowitz & Matasci 2011

50 See the work of Daniel Keefe et al. for good examples of
what's presently possible, at ivlab.cs.umn.edu/project_3dui.php